Turkish abstractive text summarization using pretrained sequence-to-sequence models
Abstract
The tremendous increase in the number of documents available on the Web has turned finding the relevant piece of information into a challenging, tedious, and time-consuming activity. Accordingly, automatic text summarization has become an important field of study, gaining significant attention from researchers. Lately, with the advances in deep learning, neural abstractive text summarization with sequence-to-sequence (Seq2Seq) models has gained popularity. There have been many improvements to these models, such as the use of pretrained language models (e.g., GPT, BERT, XLM) and pretrained Seq2Seq models (e.g., BART, T5). These improvements addressed certain shortcomings and improved upon challenges such as saliency, fluency, and semantics, enabling the generation of higher quality summaries. Unfortunately, such research attempts were mostly limited to the English language. Monolingual BERT models and multilingual pretrained Seq2Seq models have been released recently, providing the opportunity to utilize these state-of-the-art models in low-resource languages such as Turkish. In this study, we make use of pretrained Seq2Seq models and obtain state-of-the-art results on two large-scale Turkish datasets, TR-News and MLSum, for the summarization task. Then, using the title information in the datasets, we establish hard baselines for the title generation task on both datasets. We show that the input given to the models is of substantial importance for the success of such tasks. Additionally, we provide extensive analysis, including cross-dataset evaluations, various text generation options, and the effect of preprocessing on ROUGE evaluations. It is shown that the monolingual models outperform the multilingual models on all tasks across both datasets. Lastly, qualitative analyses of the generated summaries and titles are provided.
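For concreteness, here is a minimal sketch, assuming a Hugging Face setup, of the kind of pipeline the abstract describes: generating a summary with a pretrained multilingual Seq2Seq checkpoint and scoring it with ROUGE. The model name, the placeholder article, and the generation settings are illustrative assumptions rather than the authors' configuration, and a checkpoint already fine-tuned for summarization is assumed.

```python
# A minimal sketch (not the authors' code) of abstractive summarization for
# Turkish with a pretrained multilingual Seq2Seq model, scored with ROUGE.
# The checkpoint name and example texts are placeholders; the paper works with
# the TR-News and MLSum datasets, which are not bundled with this snippet.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from rouge_score import rouge_scorer

model_name = "google/mt5-small"  # assumption: a Seq2Seq checkpoint fine-tuned for summarization
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "..."    # a Turkish news article body
reference = "..."  # its gold summary (or title, for the title generation task)

inputs = tokenizer(article, truncation=True, max_length=512, return_tensors="pt")
summary_ids = model.generate(
    **inputs,
    num_beams=4,             # one of the "text generation options" the paper analyzes
    max_new_tokens=128,
    no_repeat_ngram_size=3,  # a common guard against repetitive output
)
candidate = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# ROUGE evaluation; the paper notes that preprocessing choices can shift
# these scores, so the scorer configuration matters.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)
print(scorer.score(reference, candidate))
```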
Similar resources
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentenc...
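As a rough illustration of the architecture this abstract names, the following sketch, assuming PyTorch, shows the additive (Bahdanau-style) attention at the heart of such attentional encoder-decoder models; the class name and toy dimensions are hypothetical and not the paper's implementation.

```python
# A minimal sketch (an assumption, not the paper's code) of additive attention:
# at each decoding step, the decoder state queries all encoder states and
# receives a weighted context vector over the source.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.w_enc = nn.Linear(hidden_size, hidden_size, bias=False)
        self.w_dec = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, hidden); enc_states: (batch, src_len, hidden)
        scores = self.v(torch.tanh(self.w_enc(enc_states) + self.w_dec(dec_state).unsqueeze(1)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)   # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), enc_states) # (batch, 1, hidden)
        return context.squeeze(1), weights

# Toy usage with random tensors standing in for real encoder/decoder outputs.
attn = AdditiveAttention(hidden_size=8)
ctx, w = attn(torch.randn(2, 8), torch.randn(2, 5, 8))
print(ctx.shape, w.shape)  # torch.Size([2, 8]) torch.Size([2, 5])
```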
Sequence-to-Sequence RNNs for Text Summarization
In this work, we cast text summarization as a sequence-to-sequence problem and apply the attentional encoder-decoder RNN that has been shown to be successful for Machine Translation (Bahdanau et al. (2014)). Our experiments show that the proposed architecture significantly outperforms the state-of-the-art model of Rush et al. (2015) on the Gigaword dataset without any additional tuning. We also...
Framework for Abstractive Summarization using Text-to-Text Generation
We propose a new, ambitious framework for abstractive summarization, which aims at selecting the content of a summary not from sentences, but from an abstract representation of the source documents. This abstract representation relies on the concept of Information Items (INIT), which we define as the smallest element of coherent information in a text or a sentence. Our framework differs from pr...
Neural Abstractive Text Summarization
Abstractive text summarization is a complex task whose goal is to generate a concise version of a text without necessarily reusing the sentences from the original source, but still preserving the meaning and the key contents. We address this issue by modeling the problem as a sequence to sequence learning and exploiting Recurrent Neural Networks (RNNs). This work is a discussion about our ongoi...
Text Generation for Abstractive Summarization
We have begun work on a framework for abstractive summarization and decided to focus on a module for text generation. For TAC 2010, we thus move away from sentence extraction. Each sentence in the summary we generate is based on a document sentence but it usually contains a smaller amount of information and uses fewer words. The system uses the output of a syntactic parser for a sentence and th...
Journal
Journal title: Natural Language Engineering
Year: 2022
ISSN: 1469-8110, 1351-3249
DOI: https://doi.org/10.1017/s1351324922000195